Kyle Bao | 10 June 2020 | IBM Applied Data Science Capstone Project
Singapore is a multicultural city-state with a vibrant food scene. Numerous restaurants serve cuisines from around the world, contributing to the city's diverse, high-quality food scene. With so much variety, visitors may be spoilt for choice when deciding which eateries to visit. A map showing the distribution of high-quality restaurants and eateries in Singapore will be helpful in planning a culinary adventure through the city.
Instead of using data on the boundaries of neighborhoods in Singapore, we will use location data of the stations of the Mass Rapid Transit (MRT) train network. The stations serve as a proxy for neighborhoods, and the resulting data product will also be more useful to a tourist visiting the city. Because the MRT network is cheap and efficient, it is easy to plan an itinerary around it and visit different parts of the city. Using the location data of the train stations, we will search for restaurants within walking distance of each station. The result will be a map of high-quality restaurants within walking distance of the train stations.
Location data for the train stations are obtained from the following Wikipedia page:
We will only use data for stations that are currently in operation.
The longitude and latitude coordinates for the stations will be obtained via the Nominatim geocoder for OpenStreetMap data from the geopy Python package.
The Foursquare API will be used to extract the list of restaurants around each station, as well as the restaurants' ratings and cuisine categories.
Finally, the map data used is provided by the folium package.
The raw list of Singapore MRT stations extracted from the Wikipedia page is messy and contains much irrelevant data, such as subheadings, names of stations that do not yet exist and are planned for future expansions, and duplicate entries. I cleaned the data by first removing the subheadings. The data also contains the opening dates of the stations. To drop all stations planned to open after 2020, I extracted the year from each date string using regular expressions, converted it to a number, and applied a filter. Some station names include translated names in Malay. These were stripped so that only the English names of the stations remain.
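The year-extraction and filtering step can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the sample rows, the placeholder station names, and the helper names `extract_year` and `operating_stations` are all assumptions, and the real Wikipedia table layout may differ.

```python
import re

# Made-up sample rows: (station name, opening date text).
# Names are placeholders; the real table layout may differ.
RAW_ROWS = [
    ("Alpha", "5 November 1988"),
    ("Alpha", "5 November 1988"),   # duplicate entry
    ("Bravo", "20 June 2003"),
    ("Charlie", "Expected 2027"),   # planned station
    ("Example Line", ""),           # subheading row without a date
]

# Matches a four-digit year starting with 19 or 20.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_year(date_text):
    """Return the first four-digit year in date_text, or None if absent."""
    match = YEAR_PATTERN.search(date_text)
    return int(match.group()) if match else None


def operating_stations(rows, cutoff_year=2020):
    """Keep unique station names whose opening year is at most cutoff_year."""
    seen = set()
    kept = []
    for name, opened in rows:
        year = extract_year(opened)
        if year is None or year > cutoff_year:
            continue  # drops rows without a year and stations opening later
        if name not in seen:
            seen.add(name)
            kept.append(name)
    return kept


stations = operating_stations(RAW_ROWS)
print(stations)  # ['Alpha', 'Bravo']
```

Treating rows without a parseable year as non-stations is a simplification that happens to remove subheadings in this sample; a real table may need a more explicit rule.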
Next, I used the Nominatim geocoder to obtain the latitude and longitude data for the individual stations.
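A hedged sketch of the geocoding step is shown here. It assumes the query string is the station name followed by " MRT Station, Singapore", which is a guess rather than the notebook's actual query, and the helper names and `user_agent` value are illustrative. The geocoder is passed in as a callable so the lookup logic can be tried without network access.

```python
import time


def make_nominatim_geocoder(user_agent="sg-food-map-example"):
    """Return geopy's Nominatim geocode method (needs geopy and network access)."""
    from geopy.geocoders import Nominatim  # third-party, imported lazily
    return Nominatim(user_agent=user_agent).geocode


def geocode_stations(names, geocode, suffix=" MRT Station, Singapore", pause=0.0):
    """Map each station name to (latitude, longitude), or None if not found."""
    coords = {}
    for name in names:
        location = geocode(name + suffix)  # returns None when nothing matches
        coords[name] = (location.latitude, location.longitude) if location else None
        time.sleep(pause)  # the public Nominatim server asks for at most 1 request/second
    return coords


# Real usage (performs network requests), with an assumed station name:
# coords = geocode_stations(["Little India"], make_nominatim_geocoder(), pause=1.0)
```

Stations that the geocoder cannot resolve are kept with a value of None so they can be inspected and corrected by hand.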
To obtain the list of restaurants via the Foursquare API, the explore endpoint was used. An additional parameter I included in the query is &section=food. Since we are only interested in restaurants and eateries, this parameter ensures that all venues returned by the API are eateries, which lets me make the most of the limited number of API calls. Additional details on the section parameter can be found in the official API documentation. The API also returns the detailed venue category of each venue, as well as its latitude and longitude.
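The query described above might look like the following sketch. It assumes the legacy Foursquare v2 venues/explore endpoint and its nested response layout (`response` → `groups` → `items` → `venue`) as they existed around the time of writing. The credentials, version date, limit, and helper names are placeholders, and the API has since changed.

```python
from urllib.parse import urlencode

# Legacy v2 endpoint (assumed; current Foursquare APIs differ).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=500, limit=50, version="20200610"):
    """Build an explore query restricted to food venues around a point."""
    params = {
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": version,                    # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",
        "radius": radius,                # metres
        "limit": limit,
        "section": "food",               # only restaurants and eateries
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(payload):
    """Flatten an explore response into a list of simple venue dicts."""
    venues = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            venues.append({
                "name": venue["name"],
                "category": categories[0].get("name"),
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
            })
    return venues
```

The URL would then be fetched with an HTTP client of choice and the decoded JSON passed to `parse_venues`.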
Some stations have many more restaurants and eateries within a 500 m radius than others.
We can also list all 106 distinct categories of restaurants and eateries in the data extracted from Foursquare.
I analysed the list of venues within 500 m of each station and grouped them by venue category. This way, I can identify the most common types of restaurants/eateries for each station.
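The grouping step can be illustrated with a short standard-library sketch. The input format of (station, category) pairs, the sample rows, and the function name `top_categories` are assumptions made for the example.

```python
from collections import Counter, defaultdict


def top_categories(venue_rows, n=10):
    """Count venue categories per station and return the n most common for each.

    venue_rows is an iterable of (station, category) pairs.
    """
    counts = defaultdict(Counter)
    for station, category in venue_rows:
        counts[station][category] += 1
    return {station: counter.most_common(n) for station, counter in counts.items()}


# Made-up sample rows with placeholder station names.
rows = [
    ("Alpha", "Café"),
    ("Alpha", "Indian Restaurant"),
    ("Alpha", "Café"),
    ("Bravo", "Food Court"),
]
print(top_categories(rows, n=1))
# {'Alpha': [('Café', 2)], 'Bravo': [('Food Court', 1)]}
```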
Using K-means clustering, I grouped the stations into 10 clusters based on the similarity of each station's top 10 restaurant/eatery categories. The following map shows the clusters, distinguished by colour.
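To make the clustering idea concrete, here is a small pure-Python sketch of K-means (Lloyd's algorithm) applied to per-station category frequency vectors. The text does not say which implementation or feature encoding was used. A library such as scikit-learn's `KMeans` would be the usual choice in practice, and the frequency-vector encoding, helper names, and sample data are assumptions.

```python
import random


def frequency_vectors(station_counts):
    """Turn {station: {category: count}} into {station: [relative frequencies]}."""
    categories = sorted({c for counts in station_counts.values() for c in counts})
    vectors = {}
    for station, counts in station_counts.items():
        total = sum(counts.values()) or 1  # avoid division by zero
        vectors[station] = [counts.get(c, 0) / total for c in categories]
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns one cluster label (0..k-1) per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up counts for four placeholder stations and two categories.
sample = {
    "Alpha": {"Café": 9, "Food Court": 1},
    "Bravo": {"Café": 8, "Food Court": 2},
    "Charlie": {"Café": 1, "Food Court": 9},
    "Delta": {"Food Court": 10},
}
vectors = frequency_vectors(sample)
labels = kmeans(list(vectors.values()), k=2)
print(dict(zip(vectors, labels)))
```

With this sample, Alpha and Bravo end up in one cluster and Charlie and Delta in the other. The numeric label assigned to each cluster depends on the random initialisation.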
One of the goals of this project is to provide entrepreneurs with an easy-to-use map visualisation of all the restaurants in the city. From our earlier data extraction, we already have a comprehensive data set of the restaurants and eateries within a 500 m radius of all the MRT stations in Singapore.
An entrepreneur may wish to open a restaurant serving a certain type of cuisine, and may want to know where similar restaurants/eateries are concentrated in the city. Hence, I wrote a function that can be called with the relevant parameters to map out similar restaurants.
The map also displays the MRT stations as red circles, and the eateries as either blue circles in the non-clustered map, or as popup markers in the clustered version.
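A function of this kind could be sketched as follows. This is not the notebook's actual function: the names `filter_by_category` and `build_cuisine_map`, the venue dict keys, the marker sizes, and the approximate map centre are assumptions. folium is imported lazily so the filtering helper works even where folium is not installed.

```python
def filter_by_category(venues, keyword):
    """Return venues whose category contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [v for v in venues if needle in (v.get("category") or "").lower()]


def build_cuisine_map(stations, venues, keyword, clustered=False,
                      centre=(1.3521, 103.8198)):
    """Draw stations as red circles and matching eateries on a folium map.

    stations: {name: (lat, lng)}; venues: dicts with name, category, lat, lng.
    centre is an approximate centre of Singapore (assumed).
    """
    import folium  # third-party, imported lazily
    from folium.plugins import MarkerCluster

    fmap = folium.Map(location=list(centre), zoom_start=12)
    for name, (lat, lng) in stations.items():
        folium.CircleMarker([lat, lng], radius=5, color="red", fill=True,
                            popup=name).add_to(fmap)

    matches = filter_by_category(venues, keyword)
    if clustered:
        # Clustered version: popup markers grouped by zoom level.
        layer = MarkerCluster().add_to(fmap)
        for v in matches:
            folium.Marker([v["lat"], v["lng"]], popup=v["name"]).add_to(layer)
    else:
        # Non-clustered version: one blue circle per eatery.
        for v in matches:
            folium.CircleMarker([v["lat"], v["lng"]], radius=3, color="blue",
                                fill=True, popup=v["name"]).add_to(fmap)
    return fmap


# Example call with assumed data, saving the result as an HTML page:
# build_cuisine_map(stations, venues, "Indian", clustered=True).save("indian.html")
```

Matching on a substring of the category name means a keyword such as "Indian" also picks up more specific categories that contain that word.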
Let us see it in action.
Interestingly, there seem to be very few Indian restaurants near the MRT stations in the heartland estates in the North, the Northeast, and all the way to the West. Unsurprisingly, there is a large concentration of Indian restaurants around the Little India MRT Station.
There are very few Mexican restaurants within a 500 m radius of an MRT train station. In fact, as seen from the map, our dataset shows that there are only 28 such restaurants. If demand can be ascertained, opening a Mexican restaurant could be a venture worth considering.
There are numerous restaurants and eateries in Singapore serving all kinds of cuisines. However, as this analysis and visualisation have shown, the distribution of such eateries is not even. Furthermore, some cuisines have fewer restaurants, providing opportunities for new businesses. I hope that the tool in the accompanying notebook will help people in making this business decision.
Further extensions include using a Premium Foursquare API to extract the ratings for all the restaurants in our dataset. This was not attempted in this exercise due to the prohibitive pricing.